Entropy Based Adaptive Outlier Detection Technique for Data Streams
Abstract
Outlier detection in data streams is an important problem in many application areas, such as network intrusion detection, faulty sensor detection, and fraud detection in online financial transactions. Most existing outlier detection techniques were designed for static datasets and require a global view and multiple scans of the data, which is not feasible for streaming data. In this paper, we propose an entropy-based outlier detection technique for streaming data that exploits the fact that the presence of an anomalous data object sharply increases the entropy of a clustering of normal data. The technique maintains clusters of the streaming data and measures the change in clustering entropy caused by each incoming data object. If the increase in entropy is very large, the data object is marked as a candidate outlier, and its anomalous behaviour is confirmed over multiple sliding windows to minimize false alarms. The proposed method is incremental and dynamically updates the clustering structure and entropy statistics to cope with the high volume and concept evolution of data streams. The proposed scheme has been evaluated on both synthetic and real-world data. Experimental results demonstrate its effectiveness on the following performance measures: outlier detection rate, false alarm rate, and running time.
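The abstract does not give the entropy definition, the clustering rule, or the confirmation procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that clustering entropy is the Shannon entropy of the cluster-size distribution, that a point joins the nearest centroid when it lies within a hypothetical `radius` and otherwise starts a new cluster, and that a candidate is confirmed once its cluster has stayed a singleton across `confirm` window boundaries. The names `EntropyOutlierSketch`, `radius`, `delta`, `window`, and `confirm` are all illustrative.

```python
import math


def clustering_entropy(sizes):
    """Shannon entropy (bits) of the cluster-size distribution (an assumed definition)."""
    total = sum(sizes)
    if total == 0:
        return 0.0
    return -sum((s / total) * math.log2(s / total) for s in sizes if s > 0)


class EntropyOutlierSketch:
    """Illustrative only: every parameter below is a hypothetical choice."""

    def __init__(self, radius, delta, window, confirm):
        self.radius = radius      # max distance for a point to join a cluster
        self.delta = delta        # entropy increase that triggers a candidate
        self.window = window      # number of points between window boundaries
        self.confirm = confirm    # boundaries a candidate must stay isolated
        self.centroids, self.sizes = [], []
        self.candidates = {}      # cluster index -> [point, boundaries survived]
        self.outliers = []
        self.seen = 0

    def _nearest(self, x):
        best, best_d = None, float("inf")
        for i, c in enumerate(self.centroids):
            d = math.dist(x, c)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def update(self, x):
        """Absorb one point incrementally and return the entropy change it caused."""
        before = clustering_entropy(self.sizes)
        i, d = self._nearest(x)
        if i is not None and d <= self.radius:
            n = self.sizes[i]     # running-mean update of the centroid
            self.centroids[i] = tuple(
                (c * n + v) / (n + 1) for c, v in zip(self.centroids[i], x)
            )
            self.sizes[i] = n + 1
        else:
            self.centroids.append(tuple(x))
            self.sizes.append(1)
            i = len(self.sizes) - 1
        gain = clustering_entropy(self.sizes) - before
        # Simplification: only points that opened a new cluster become candidates.
        if gain > self.delta and self.sizes[i] == 1:
            self.candidates[i] = [tuple(x), 0]
        self.seen += 1
        if self.seen % self.window == 0:
            self._end_of_window()
        return gain

    def _end_of_window(self):
        for i in list(self.candidates):
            point, survived = self.candidates[i]
            if self.sizes[i] > 1:
                # The cluster attracted more points: treated here as concept
                # evolution rather than an outlier, so the candidate is dropped.
                del self.candidates[i]
            elif survived + 1 >= self.confirm:
                self.outliers.append(point)
                del self.candidates[i]
            else:
                self.candidates[i][1] = survived + 1
```

In this sketch a point far from a large, established cluster opens a singleton cluster and raises the entropy, while a point absorbed by the dominant cluster does not. Deferring the decision to later window boundaries is what lets a genuinely new cluster (concept evolution) escape being reported as a false alarm.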
Similar resources
An Adaptive Outlier Detection Technique for Data Streams
This work presents an adaptive outlier detection technique for data streams, called Automatic Outlier Detection for Data Streams (A-ODDS), which identifies outliers with respect to all received data points (global context) as well as temporally close data points (local context), where the local context is selected based on time and changes in the data distribution.
Continuous Adaptive Outlier Detection on Distributed Data Streams
In many applications, stream data are too voluminous to be collected centrally and are often transmitted over a distributed network. In this paper, we focus on real-time outlier detection over distributed data streams. First, we formalize the problem of outlier detection using the kernel density estimation technique. Then, we adopt the fading strategy to keep pace with the transie...
DBOD-DS: Distance Based Outlier Detection for Data Streams
The data stream is a newly emerging data model for applications such as environment monitoring, Web click streams, and network traffic monitoring. It consists of an infinite sequence of timestamped data points arriving from an external data source. Typically, data sources are located onsite and are very vulnerable to external attacks and natural calamities, thus outliers are very common in the da...
A Novel Approach for Outlier Detection using Rough Entropy
Outlier detection is an important task in data mining and its applications. An outlier is defined as a data point that differs greatly from the rest of the data according to some measure. Such data often contain useful information about abnormal behavior of the system described by patterns. In this paper, a novel method for outlier detection in inconsistent datasets is proposed. This method expl...
Efficient Algorithms for Mining Data Streams
Data streams are ordered sets of values that are fast, continuous, mutable, and potentially unbounded. Examples of data streams include the pervasive time series that span domains such as finance, medicine, and transportation. Mining data streams requires approaches that are efficient, adaptive, and scalable. For several stream mining tasks, knowledge of the data’s probability density function ...